PSCI 2270 - Week 7
Department of Political Science, Vanderbilt University
October 10, 2023
Theories are made up of concepts (nodes):
Concepts are latent:
Indicators are concrete:
Important to consider how we construct indicators
Sometimes there is slippage between latent concept and proxy, e.g.
Important to make measurement as unobtrusive as possible
Reliability:
Validity:
Document analysis: Use of any audio, visual, or written materials as a source of data
Interview data: Data that are collected from responses to questions posed by the researcher to a respondent
Firsthand observation: Data that may be collected by making observations in a field study or in a laboratory setting
Geographic Information Systems (GIS) are being applied with increasing frequency, and with increasing sophistication, in international relations and in political science more generally. Their benefits have been impressive: analyses that simply would not have been possible without GIS are now being completed, and the spatial component of international politics—long considered central but rarely incorporated analytically—has been given new emphasis. However, new methods face new challenges, and to apply GIS successfully, two specific issues need to be addressed: measurement validity and selection bias. Both relate to the challenge of conceptualizing nonspatial phenomena with the spatial tools of GIS. Significant measurement error can occur when the concepts that are coded as spatial variables are not, in fact, validly measured by the default data structure of GIS, and selection bias can arise when GIS systematically excludes certain types of units. Because these potential problems are hidden by the technical details of the method, GIS data sets and analyses can sometimes appear to overcome these challenges when, in fact, they fail to do so. Once these issues come to light, however, potential solutions become apparent—including some in existing applications in international relations and in other fields.
Vector data: Points, lines, and polygons to describe spatial features: a point for a feature at a single location, a line for a linear feature such as a road, or a polygon for a feature that covers a definable spatial area.
Raster data: Pixels, predefined equivalent-sized units that are then assigned a value for a single variable across the entire area covered by the data.
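To make the vector/raster distinction concrete, here is a minimal sketch in plain Python. All coordinates, names (protest_site, road, district), and the grid size are invented for illustration; real GIS work would normally use a dedicated library. The sketch stores three vector features as coordinates, then converts the polygon into a raster by assigning each equal-sized cell a single value.

```python
# Vector data: features described by coordinates (all values are made up).
protest_site = (2.5, 1.5)                                     # point: one location
road = [(0.0, 0.0), (2.0, 1.0), (4.0, 1.0)]                   # line: ordered vertices
district = [(1.0, 0.0), (4.0, 0.0), (4.0, 3.0), (1.0, 3.0)]   # polygon: closed ring


def point_in_polygon(x, y, ring):
    """Ray-casting test: is the point (x, y) inside the polygon `ring`?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge cross the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(ring, n_rows, n_cols, cell_size=1.0):
    """Raster data: a grid of equal-sized cells, one value per cell.

    A cell gets 1 if its centre lies inside the polygon, otherwise 0.
    """
    grid = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            cx = (c + 0.5) * cell_size
            cy = (r + 0.5) * cell_size
            row.append(1 if point_in_polygon(cx, cy, ring) else 0)
        grid.append(row)
    return grid


grid = rasterize(district, n_rows=3, n_cols=5)
for row in grid:
    print(row)
print("protest site inside district:", point_in_polygon(*protest_site, district))
```

Note the design choice hidden in rasterize: every cell must receive exactly one value, so anything that does not fit the grid (a boundary cutting through a cell, a concept with no clear location) is forced into it anyway. That is one way the default data structure can introduce measurement error.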
Stasavage, David. 2011. States of Credit: Size, Power, and the Development of European Polities. Princeton, NJ: Princeton University Press.
Starr, Harvey. 2013. On Geopolitics: Space, Place, and International Relations. Boulder, CO: Paradigm.
Cederman, Lars-Erik, Kristian Skrede Gleditsch, and Halvard Buhaug. 2013. Inequality, Grievances, and Civil War. New York: Cambridge University Press.
Measurement validity:
Selection bias:
What kinds of geolocation or spatial data can we use to study the determinants of protest activity?
Text has always been an important data source in political science. What has changed in recent years is the feasibility of investigating large amounts of text quantitatively. The internet provides political scientists with more data than their mentors could have imagined, and the research community is providing accessible text analysis software packages, along with training and support. As a result, text-as-data research is becoming mainstream in political science. Scholars are tapping new data sources, they are employing more diverse methods, and they are becoming critical consumers of findings based on those methods. In this article, we first describe the four stages of a typical text-as-data project. We then review recent political science applications and explore one important methodological challenge—topic model instability—in greater detail.
Obtaining text
From text to data
Quantitative analysis of text
Evaluating performance
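The "from text to data" stage can be sketched as building a document-term matrix: each document becomes a row of word counts. This is a minimal illustration using only the Python standard library; the two example documents, the stopword list, and the names docs, tokenize, and dtm are assumptions made for the example, not part of any particular package.

```python
import re
from collections import Counter

# Hypothetical mini-corpus (stage 1, obtaining text, is assumed already done).
docs = {
    "speech_a": "The protest was peaceful. The police watched the protest.",
    "speech_b": "Police dispersed the protest downtown.",
}

# A tiny illustrative stopword list; real projects use much longer ones.
STOPWORDS = {"the", "was", "a", "an", "of"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


# Count terms per document, then align the counts on a shared vocabulary.
counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
vocab = sorted(set().union(*counts.values()))
dtm = {name: [c[term] for term in vocab] for name, c in counts.items()}

print(vocab)
for name, row in dtm.items():
    print(name, row)
```

Every preprocessing choice here (lowercasing, which stopwords to drop, ignoring word order) is a measurement decision, which is why the later stages of analysis and evaluation depend on it.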
Classification: Unsupervised machine learning methods compare the similarity of documents based on co-occurring features
Scaling: Use texts to locate political actors on ideological space
Text Reuse: Explicitly value word sequencing in judging document similarity
Natural Language Processing: Moving from “whom?” to “who did what to whom?”
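The difference between comparing documents by co-occurring features and valuing word sequence can be shown with a small sketch. It contrasts cosine similarity on word counts with the share of word pairs (bigrams) that two documents have in common. The two sentences and the function names are invented for illustration, and bigram overlap is only one simple stand-in for text-reuse methods.

```python
import math
from collections import Counter


def tokens(text):
    """Split a lowercase copy of the text on whitespace."""
    return text.lower().split()


def cosine_similarity(a, b):
    """Similarity based on word counts only; word order is ignored."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values()) * sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def ngram_overlap(a, b, n=2):
    """Jaccard overlap of word n-grams; sensitive to word order."""
    def ngrams(text):
        toks = tokens(text)
        return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

    ga, gb = ngrams(a), ngrams(b)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0


doc_1 = "police attacked protesters"
doc_2 = "protesters attacked police"

print("cosine similarity:", cosine_similarity(doc_1, doc_2))
print("bigram overlap:", ngram_overlap(doc_1, doc_2))
```

The two sentences use exactly the same words, so the word-count measure treats them as identical, while they share no bigrams at all. The example also hints at why "who did what to whom" needs more than counts: the actor and the target have swapped places.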
Measurement reliability:
Measurement validity:
Selection bias:
What kinds of text data can we use to study the determinants of protest activity?